Abstract:

Asymptotic properties of a dimension-robust dependence measure are
investigated. The measure is related to those used in independence tests, but
it is differentiable and thus suitable for independent component analysis. An
adjustable kernel makes it possible to accelerate the convergence of the
estimator without affecting the bias.

1 Introduction

Since the 1950s, there has been continuous research activity on the definition
of measures of dependence, that is, non-negative functions that are equal to
zero if and only if the variables are independent. The need for such measures
first appeared in the construction of independence tests. Hoeffding [10]
proposed to define an independence test by comparing the joint cumulative
distribution function with the product of the marginal cumulative distribution
functions. Later, several authors, including Rosenblatt [12], Blum et al. [2]
and Feuerverger [8], studied independence tests defined by comparing the joint
density with the product of the marginal densities, or the joint
characteristic function with the product of the marginal characteristic
functions. In general, however, these tests are constructed to check the
independence of only two variables, and are unsuitable in higher dimensions
because of the curse of dimensionality in density estimation. Recently,
measures of dependence have received renewed interest as they play a crucial
role in obtaining a procedure for independent component analysis (ICA). This
analysis aims at finding a transformation (usually linear) of a vector of
observations such that the transformed vector has independent components. To
this end one minimizes a measure of the dependence between the transformed
components. In order to employ efficient minimisation procedures, such a
measure has to be differentiable with respect to the transformation, which is
not the case for measures based on order statistics, for example.

In this letter, we study the dependence measure called quadratic dependence,
introduced in [1], whose definition involves an adjustable kernel function.
In order to relate this quadratic dependence to other existing dependence
measures (e.g. those introduced by Chen and Bickel [4], Eriksson and Koivunen
[7] and Kankainen [11]), we derive two different expressions for it (Section
2). The first one is based on the comparison of the joint characteristic
function with the product of the marginal characteristic functions, which
allows us to derive asymptotic properties of the estimator. The second one is
based on its decomposition into U-statistics, and allows us to prove
asymptotic normality and to gain insight into the crucial choice of the kernel
bandwidth (Section 3). In Section 3.3 we apply the quadratic dependence in the
context of independence tests.

2 Definitions of the quadratic dependence measure and estimation



We introduce a dependence measure which is continuous and differentiable, so
as to allow convenient minimisation procedures. Let $\mathcal{K}$ be a
summable function whose Fourier transform is different from zero almost
everywhere. Then, for any random variables $Y_1, \dots, Y_K$, the equality of
$E\big[\prod_{k=1}^K \mathcal{K}(y_k - Y_k)\big]$ and
$\prod_{k=1}^K E[\mathcal{K}(y_k - Y_k)]$ for all vectors $(y_1, \dots, y_K)$
in $\mathbb{R}^K$ is equivalent to the independence of $Y_1, \dots, Y_K$.
Thus, a dependence measure can be obtained by associating this
characterization of independence with a quadratic measure, as described below.
This dependence measure is called the quadratic dependence and was first
introduced in [1].



2.1 A kernel-based characterisation of independence 

Definition 2.1 (Quadratic dependence) Let $\mathcal{K}$ be a square summable
kernel function with Fourier transform different from zero almost everywhere.
For a set of $K$ random variables $Y_1, \dots, Y_K$ (with finite variance), we
define the quadratic measure of their (mutual) dependence as

$$Q(Y_1, \dots, Y_K) = \frac{1}{2} \int D_Y(y_1, \dots, y_K)^2 \, dy_1 \cdots dy_K,$$

where $Y = (Y_1, \dots, Y_K)^T$ and

$$D_Y(y_1, \dots, y_K) = E\Big[\prod_{k=1}^K \mathcal{K}_h\Big(y_k - \frac{Y_k}{\sigma_{Y_k}}\Big)\Big] - \prod_{k=1}^K E\Big[\mathcal{K}_h\Big(y_k - \frac{Y_k}{\sigma_{Y_k}}\Big)\Big]. \qquad (1)$$

Here $\sigma_{Y_k}$ is a scale factor, that is, a positive functional of the
distribution of $Y_k$ such that $\sigma_{\lambda Y_k} = |\lambda| \sigma_{Y_k}$
for every real constant $\lambda$, and $\mathcal{K}_h(x) = \mathcal{K}(x/h)/h$.

First of all, the following lemma establishes that the function Q is a
dependence measure.

Lemma 2.1 For any random variables $Y_1, \dots, Y_K$, $Q(Y_1, \dots, Y_K) = 0$
if and only if the random variables $Y_1, \dots, Y_K$ are independent.

The proof of Lemma 2.1 follows from the continuity of the characteristic
functions and the equivalent expression of Q in terms of the characteristic
functions stated in Lemma 2.2.

Lemma 2.2 Let $\psi_Y$ denote the joint characteristic function of
$Y_1, \dots, Y_K$, $\psi_{Y_k}$ the characteristic function of $Y_k$, and
$\psi_{\mathcal{K}}$ the Fourier transform of $\mathcal{K}$. Let $D^\psi_Y$ be
the difference between the joint characteristic function and the product of
the marginal characteristic functions:

$$D^\psi_Y(y_1, \dots, y_K) = \psi_Y(y_1, \dots, y_K) - \prod_{k=1}^K \psi_{Y_k}(y_k), \qquad (2)$$

where $Y = [Y_1, \dots, Y_K]^T$. Then, the quadratic dependence Q can be
expressed as a weighted average of $|D^\psi_Y|^2$:

$$Q(Y_1, \dots, Y_K) = \frac{1}{2} \int \prod_{k=1}^K \sigma_{Y_k} \big|\psi_{\mathcal{K}_h}(\sigma_{Y_k} y_k)\big|^2 \, \big|D^\psi_Y(y)\big|^2 \, \frac{dy_1}{2\pi} \cdots \frac{dy_K}{2\pi}. \qquad (3)$$

The proof of this lemma is given in Appendix A. Also, it is easily verified
from (3) that the quadratic dependence is invariant under translation and
under multiplication by a scalar.
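
The invariance claimed above can also be read off directly from (1). The following lines are a sketch rather than part of the original argument: they assume, in addition to the scaling property of Definition 2.1, that the scale factor is translation invariant (as a standard deviation would be), and they treat negative scalars only for an even kernel.

```latex
% Assumption (not stated in the text): \sigma_{Y_k + c_k} = \sigma_{Y_k}.
% Translation by c = (c_1, \dots, c_K): the argument of D is shifted,
D_{Y+c}(y_1, \dots, y_K)
  = D_Y\Big(y_1 - \tfrac{c_1}{\sigma_{Y_1}}, \dots, y_K - \tfrac{c_K}{\sigma_{Y_K}}\Big)
\;\Longrightarrow\;
\int D_{Y+c}(y)^2 \, dy = \int D_Y(y)^2 \, dy .

% Scaling by \lambda_k > 0: \sigma_{\lambda_k Y_k} = \lambda_k \sigma_{Y_k}, hence
\frac{\lambda_k Y_k}{\sigma_{\lambda_k Y_k}} = \frac{Y_k}{\sigma_{Y_k}}
\;\Longrightarrow\; D_{\lambda Y} = D_Y .

% Scaling by \lambda_k < 0: Y_k / \sigma_{Y_k} changes sign. If \mathcal{K} is even,
D_{\lambda Y}(\dots, y_k, \dots) = D_Y(\dots, -y_k, \dots),
% and the substitution y_k \to -y_k leaves the integral of D^2 unchanged.
```

In all three cases the integral of $D_Y^2$, and therefore Q, is unchanged.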

The measure Q has been considered by Kankainen [11], Eriksson and Koivunen [7]
and Feuerverger [8], but only in the particular case where $\mathcal{K}$ is a
Gaussian kernel and without a scaling factor. It can also be seen as a
generalisation of the measure defined by Rosenblatt [12]. Indeed, when the
bandwidth tends to zero and under the usual hypotheses on the density and the
kernel, Q converges to the quadratic measure of the difference between the
joint density and the product of the marginal densities.

Lemma 2.3 Let $p_Y$ denote the joint density of $Y_1, \dots, Y_K$ and
$p_{Y_k}$ the density of $Y_k$. Let us assume that the kernel $\mathcal{K}$ is
a Parzen-Rosenblatt kernel, that is, $\lim_{|x| \to \infty} |x| \mathcal{K}(x) = 0$.
Then, for all $y$ where the joint density is continuous,

$$\lim_{h \to 0} E\Big[\prod_{k=1}^K \mathcal{K}_h\Big(y_k - \frac{Y_k}{\sigma_{Y_k}}\Big)\Big] = p_Y\Big(\frac{y_1}{\sigma_{Y_1}}, \dots, \frac{y_K}{\sigma_{Y_K}}\Big) \Big/ \prod_{k=1}^K \sigma_{Y_k}.$$

And for all $y_k$ where the marginal density $p_{Y_k}$ is continuous,

$$\lim_{h \to 0} E\Big[\mathcal{K}_h\Big(y_k - \frac{Y_k}{\sigma_{Y_k}}\Big)\Big] = p_{Y_k}\Big(\frac{y_k}{\sigma_{Y_k}}\Big) \Big/ \sigma_{Y_k}.$$

Moreover, if $p_Y$ is twice continuously differentiable with bounded
derivatives and the first moments of $\mathcal{K}$ and $\mathcal{K}^2$ exist,
then

$$\lim_{h \to 0} Q(Y_1, \dots, Y_K) = \frac{1}{2} \int \Big| p_Y(y) - \prod_{k=1}^K p_{Y_k}(y_k) \Big|^2 dy.$$

This result is proved in Appendix B.

Chen and Bickel [4] studied the minimum of an estimator of (3) in the context
of linear ICA and proved its consistency independently of the choice of the
kernel (see footnote 1). They point out, however, that the choice of the
kernel, and especially variations of its bandwidth, can dramatically change
the variance and the convergence in moments of the estimators. One purpose of
the present study is to shed some light on the influence of the bandwidth on
the behaviour of the quadratic dependence measure in the context of
independence tests.



2.2 Kernel trick 



The quadratic dependence as rewritten in (3) is not easy to estimate because
of the multiple integration. The following lemma derives a formula for the
quadratic dependence from which a convenient estimator arises. The trick
employed for this is specific to this measure, and is a first step towards
addressing the curse of dimensionality.

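Lemma 2.4 Let $\mathcal{K}_2$ be the convolution of $\mathcal{K}$ with its
mirror, i.e. $\mathcal{K}_2(u) = \int \mathcal{K}(u + v) \mathcal{K}(v) \, dv$.
For a set of $K$ random variables $Y_1, \dots, Y_K$ (with finite variance),
the quadratic measure of their (mutual) dependence is equal to

$$Q(Y_1, \dots, Y_K) = \frac{1}{2} \Big\{ E[\pi_Y(Y)] + \prod_{k=1}^K E[\pi_{Y_k}(Y_k)] - 2 E\Big[\prod_{k=1}^K \pi_{Y_k}(Y_k)\Big] \Big\}, \qquad (4)$$

where

$$\pi_Y(y) = E\Big[\prod_{k=1}^K \mathcal{K}_{2,h}\Big(\frac{y_k - Y_k}{\sigma_{Y_k}}\Big)\Big], \qquad \pi_{Y_k}(y_k) = E\Big[\mathcal{K}_{2,h}\Big(\frac{y_k - Y_k}{\sigma_{Y_k}}\Big)\Big],$$

$\mathcal{K}_{2,h}(x) = \mathcal{K}_2(x/h)/h$, and $\sigma_{Y_k}$ is a scale
factor (see Definition 2.1).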
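
The step behind this kernel trick can be written out in two lines. This is a sketch of the computation, not a replacement for the full proof: it introduces an independent copy $Y'$ of $Y$ purely as notation and omits the integrability checks needed to exchange the integrals.

```latex
% For real numbers a, b, substitute v = y - b:
\int \mathcal{K}_h(y - a) \, \mathcal{K}_h(y - b) \, dy
  = \int \mathcal{K}_h\big(v + (b - a)\big) \, \mathcal{K}_h(v) \, dv
  = \mathcal{K}_{2,h}(b - a) = \mathcal{K}_{2,h}(a - b),
\qquad \mathcal{K}_{2,h}(x) = \tfrac{1}{h} \mathcal{K}_2(x / h).

% With Y' an independent copy of Y, a_k = Y_k / \sigma_{Y_k}, b_k = Y'_k / \sigma_{Y_k}:
\int E\Big[\prod_{k=1}^K \mathcal{K}_h\Big(y_k - \tfrac{Y_k}{\sigma_{Y_k}}\Big)\Big]^2 dy
  = E\Big[\prod_{k=1}^K \mathcal{K}_{2,h}\Big(\tfrac{Y_k - Y'_k}{\sigma_{Y_k}}\Big)\Big]
  = E[\pi_Y(Y)] .
```

The K-dimensional integral over $y$ thus collapses into an expectation over pairs of observations, and the two other terms of the expanded square are handled in the same way.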

The proof of this lemma is given in Appendix C. This lemma shows that Q
depends on $\mathcal{K}$ only indirectly through $\mathcal{K}_2$; therefore we
can choose $\mathcal{K}_2$ directly without ever considering $\mathcal{K}$.
For consistency with its definition, $\mathcal{K}_2$ must be chosen such that
its Fourier transform is a positive summable even function, since its Fourier
transform corresponds to $|\psi_{\mathcal{K}}|^2$, where $\psi_{\mathcal{K}}$
is the Fourier transform of a real square summable function. Moreover, the
Fourier transform of $\mathcal{K}_2$ has to be different from zero almost
everywhere.

1 Note also that, in the definition of Chen and Bickel [4], there is no
scaling factor in the weight function and, therefore, no invariance under
multiplication by a factor.

Figure 1: Plots of possible kernels and their Fourier transforms: the
Gaussian kernel, the square Cauchy kernel, and the negative of the second
derivative of the square Cauchy kernel. (a) is the plot of the kernels and (b)
is the plot of their Fourier transforms.

Some possible choices for $\mathcal{K}_2$ are (denoting by
$\psi_{\mathcal{K}_2}$ the Fourier transform of $\mathcal{K}_2$, with the
convention $\psi_{\mathcal{K}_2}(t) = \int \mathcal{K}_2(x) e^{itx} dx$): the
Gaussian kernel, $\mathcal{K}_2(x) = e^{-x^2}$,
$\psi_{\mathcal{K}_2}(t) = \sqrt{\pi} e^{-t^2/4}$; the square Cauchy kernel,
$\mathcal{K}_2(x) = 1/(1 + x^2)^2$,
$\psi_{\mathcal{K}_2}(t) = \frac{\pi}{2} (|t| + 1) e^{-|t|}$; the negative of
the second derivative of the square Cauchy kernel,
$\mathcal{K}_2(x) = -(20x^2 - 4)/(1 + x^2)^4$,
$\psi_{\mathcal{K}_2}(t) = \frac{\pi}{2} t^2 (|t| + 1) e^{-|t|}$.

The first two kernels correspond to density kernels (after normalisation) and
differ only in their tail behaviour. The third kernel is not a density kernel
and takes negative values (see Figure 1). One may note that the kernel
$\mathcal{K}_2$ is related to Mercer kernels, which are used in particular in
support vector machines [14].
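
The three kernel and Fourier transform pairs can be checked numerically. The sketch below assumes the convention $\psi(t) = \int \mathcal{K}_2(x) e^{itx} dx$ and uses a plain trapezoidal rule on a truncated interval; the function and dictionary names are illustrative and do not come from the paper.

```python
import numpy as np

def fourier_transform(kernel, t, half_width=200.0, n_points=400_001):
    """Approximate psi(t) = integral of kernel(x) * exp(i t x) dx.

    The kernels considered here are even, so only the cosine part contributes.
    The integral is truncated to [-half_width, half_width] (an approximation).
    """
    x = np.linspace(-half_width, half_width, n_points)
    values = kernel(x) * np.cos(t * x)
    dx = x[1] - x[0]
    # Trapezoidal rule written out explicitly.
    return float(np.sum(values) * dx - 0.5 * dx * (values[0] + values[-1]))

# Candidate kernels K_2 and their closed-form transforms under the convention above.
kernels = {
    "gaussian": lambda x: np.exp(-x ** 2),
    "square_cauchy": lambda x: 1.0 / (1.0 + x ** 2) ** 2,
    "neg_second_derivative": lambda x: -(20.0 * x ** 2 - 4.0) / (1.0 + x ** 2) ** 4,
}
closed_forms = {
    "gaussian": lambda t: np.sqrt(np.pi) * np.exp(-t ** 2 / 4.0),
    "square_cauchy": lambda t: 0.5 * np.pi * (abs(t) + 1.0) * np.exp(-abs(t)),
    "neg_second_derivative": lambda t: 0.5 * np.pi * t ** 2 * (abs(t) + 1.0) * np.exp(-abs(t)),
}

for name in kernels:
    for t in (0.0, 0.5, 1.3, 2.0):
        numeric = fourier_transform(kernels[name], t)
        print(f"{name:>22s}  t={t:3.1f}  numeric={numeric:.6f}  closed form={closed_forms[name](t):.6f}")
```

All three transforms are non-negative and vanish at most at isolated points, as required of $\psi_{\mathcal{K}_2}$; the third one vanishes at $t = 0$, which is consistent with the kernel integrating to zero.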

2.3 Estimation 

An estimator of Q is defined using expression (4). In the sequel, the observed
data will be denoted by $Y_k(n)$, $n = 1, \dots, N$; $k = 1, \dots, K$; N
being the sample size, and the scaling factor
$\sigma = (\sigma_{Y_1}, \dots, \sigma_{Y_K})$ is supposed to be known, that
is, independent of the sample. Let us remark that (4) involves only the
expectation operator E; thus a natural estimator of Q can be obtained simply
by replacing this operator with the sample average $\hat{E}$, defined as
$\hat{E}\phi(Y) = \sum_{n=1}^N \phi(Y(n))/N$ for any function $\phi$ of K
(real) variables. Thus, an estimator of Q is

$$\hat{Q}(Y_1, \dots, Y_K) = \frac{1}{2} \Big\{ \hat{E}\hat{\pi}_Y(Y) + \prod_{k=1}^K \hat{E}\hat{\pi}_{Y_k}(Y_k) - 2 \hat{E}\prod_{k=1}^K \hat{\pi}_{Y_k}(Y_k) \Big\}, \qquad (5)$$

where

$$\hat{\pi}_Y(y) = \frac{1}{N} \sum_{n=1}^N \prod_{k=1}^K \mathcal{K}_{2,h}\Big(\frac{y_k - Y_k(n)}{\sigma_{Y_k}}\Big), \qquad \hat{\pi}_{Y_k}(y_k) = \frac{1}{N} \sum_{n=1}^N \mathcal{K}_{2,h}\Big(\frac{y_k - Y_k(n)}{\sigma_{Y_k}}\Big).$$

Note that the computational cost of the estimator $\hat{Q}$ is of order
$KN^2$. As the exact expression of Q is given in terms of the characteristic
functions, the estimator $\hat{Q}$ can alternatively be rewritten in terms of
the estimators of the characteristic functions.
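
A minimal NumPy sketch of the estimator may help fix ideas. It assumes the Gaussian choice $\mathcal{K}_2(x) = e^{-x^2}$ and, when no scale factor is supplied, falls back on the sample standard deviation, which is a convenience assumption (the text takes the scale factors as known). The function names are illustrative.

```python
import numpy as np

def gaussian_k2(x, h):
    """K_{2,h}(x) = K_2(x / h) / h with the Gaussian choice K_2(x) = exp(-x**2)."""
    return np.exp(-(x / h) ** 2) / h

def quadratic_dependence(Y, h=1.0, sigma=None):
    """Sample quadratic dependence for an (N, K) array of observations.

    Builds all pairwise kernel values, hence O(K N^2) time and memory.
    """
    Y = np.asarray(Y, dtype=float)
    if sigma is None:
        sigma = Y.std(axis=0)  # assumption: standard deviation used as scale factor
    sigma = np.asarray(sigma, dtype=float)
    # G[k, n, m] = K_{2,h}((Y_k(n) - Y_k(m)) / sigma_k)
    diffs = (Y.T[:, :, None] - Y.T[:, None, :]) / sigma[:, None, None]
    G = gaussian_k2(diffs, h)
    marg = G.mean(axis=2)                      # marg[k, n] = pi_hat_{Y_k}(Y_k(n))
    term_joint = G.prod(axis=0).mean()         # E_hat[pi_hat_Y(Y)]
    term_product = marg.mean(axis=1).prod()    # prod_k E_hat[pi_hat_{Y_k}(Y_k)]
    term_cross = marg.prod(axis=0).mean()      # E_hat[prod_k pi_hat_{Y_k}(Y_k)]
    return 0.5 * (term_joint + term_product - 2.0 * term_cross)

rng = np.random.default_rng(0)
a = rng.standard_normal(300)
b = rng.standard_normal(300)
q_independent = quadratic_dependence(np.column_stack([a, b]))
q_dependent = quadratic_dependence(np.column_stack([a, a]))
print(q_independent, q_dependent)
```

With independent columns the value is close to zero, while duplicating a column gives a clearly larger value. Because the default scale factor rescales with the data, the result is also unchanged when the columns are shifted or multiplied by a non-zero constant.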

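Lemma 2.5

$$\hat{Q}(Y_1, \dots, Y_K) = \frac{1}{2} \prod_{k=1}^K \sigma_{Y_k} \int \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(\sigma_{Y_k} y_k) \, \Big| \hat{\psi}_Y(y) - \prod_{k=1}^K \hat{\psi}_{Y_k}(y_k) \Big|^2 \frac{dy_1}{2\pi} \cdots \frac{dy_K}{2\pi},$$

where

$$\hat{\psi}_Y(y_1, \dots, y_K) = \hat{E}\Big[\prod_{k=1}^K e^{i y_k Y_k}\Big] = \frac{1}{N} \sum_{n=1}^N \prod_{k=1}^K e^{i y_k Y_k(n)}, \qquad \hat{\psi}_{Y_k}(y_k) = \frac{1}{N} \sum_{n=1}^N e^{i y_k Y_k(n)}.$$

The proof is given in Appendix D.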

3 Asymptotic properties 

Having proposed a convenient estimator of the quadratic dependence, the objec- 
tive is now to show asymptotic properties of the estimator Q, in order to control 
its efficiency in independence tests. First, we note that this estimator Q can be 
expressed in terms of U-statistics. The asymptotic behaviour of the estimator 
Q under the hypothesis of dependence of the random variables is given first. 
Then, using U-statistics, the variance of the estimator Q is computed. Finally, 
it is shown that the estimator Q converges to a Gaussian distribution. 

3.1 Convergence under the hypothesis of dependence 

Lemma 3.1 Suppose that the Fourier transform of $\mathcal{K}_2$ is positive
and different from zero almost everywhere. Then, under the hypothesis of
dependence of the random variables $Y_1, \dots, Y_K$,
$\lim_{N \to +\infty} \hat{Q}(Y_1, \dots, Y_K) > 0$ a.s., for any cumulative
distribution function of Y.

The proof is given in Appendix E.



3.2 Bias and variance 

Unlike what happens in density estimation, the estimator $\hat{Q}$ is
unbiased, that is, $E[\hat{Q}] = Q$. This comes from Hoeffding [9]. This
result is completely independent of the choice of the kernel. Consequently,
the bandwidth does not have to depend on the sample size in any specific way
in order to achieve convergence in mean. In particular, the bandwidth does not
have to vanish as the sample size tends to infinity. Moreover, the convergence
of the estimator of Q does not suffer from the curse of dimensionality.
We also show that the variance of the estimator $\hat{Q}$ goes to zero for any
fixed bandwidth. More precisely, following the development of Hoeffding [9],
we deduce the dominant terms in the expansion of the variance of $\hat{Q}$
(the proof is given in Appendix G):

$$\operatorname{var}(\hat{Q}) = \frac{4}{N} \xi^{(11)} + \frac{4K^2}{N} \xi^{(22)} + \frac{4(K+1)^2}{N} \xi^{(33)} - \frac{8(K+1)}{N} \xi^{(13)} - \frac{8K(K+1)}{N} \xi^{(23)} + \frac{8K}{N} \xi^{(12)} + o(1/N),$$

where the terms $\xi^{(ij)}$ are defined as follows.



i K 

£(i2) = S (21) = -5^ J E7[JJ J E7(7r n (r fc ))7ry 1 (^)7rYCY)]-M a 



1=1 k=jtl 



£(13) — £(31) 



— L-i^Y) f[ n Yk (Y k )} - 6,6s 

K + 1 Li 

^^k(Y)iy,«)] 
with n Yl (yi) = E n fc ^i K Yk {Y k )K 2 ,h 

K K 

£(23) = £(32) = K(R+ ^ J2Ii E ^™ E ^ Y l) l[*Y k (Y k )] - ^3 

^ ' k=l l+k k=l 



Yi-yi 
<y Y , 



£(22) 



fe=l l^k 
1 - 

K(K + 1) E H E MYk)}E[ir Yl (Yi)KY m (Y m )} 



1 K 

= K2 E H E M Y k)] II skWPhWR(n,) ] -el 

l,m—l k^l k^m
K 1 K



V ' fc=l V ' 2,m=l 



None of these quantities depends on N, but they depend on the choice of
the kernel and its bandwidth, and on the distribution of the observations.
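
The decrease of the variance with N at a fixed bandwidth can be observed in a small simulation. This is only an illustrative sketch: the dependent Gaussian pair, the Gaussian choice of $\mathcal{K}_2$, the number of replications and the sample sizes are all assumptions made for the example.

```python
import numpy as np

def q_hat(Y, sigma, h=1.0):
    """Sample quadratic dependence with the Gaussian choice K_2(x) = exp(-x**2)."""
    diffs = (Y.T[:, :, None] - Y.T[:, None, :]) / sigma[:, None, None]
    G = np.exp(-(diffs / h) ** 2) / h          # G[k, n, m], pairwise kernel values
    marg = G.mean(axis=2)
    return 0.5 * (G.prod(axis=0).mean() + marg.mean(axis=1).prod()
                  - 2.0 * marg.prod(axis=0).mean())

def simulate(N, reps, rng, h=1.0):
    """Draw `reps` dependent samples of size N and return the estimates."""
    # Known scale factors: the true standard deviations of Y_1 and Y_2.
    sigma = np.array([1.0, np.sqrt(1.25)])
    out = np.empty(reps)
    for r in range(reps):
        y1 = rng.standard_normal(N)
        y2 = y1 + 0.5 * rng.standard_normal(N)  # Y_2 depends on Y_1
        out[r] = q_hat(np.column_stack([y1, y2]), sigma, h)
    return out

rng = np.random.default_rng(0)
variances = {N: simulate(N, reps=80, rng=rng).var(ddof=1) for N in (50, 200)}
for N, v in variances.items():
    print(f"N={N:4d}  empirical variance of Q-hat = {v:.3e}")
```

The bandwidth is kept at h = 1 for both sample sizes, so the drop in empirical variance comes from N alone.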

3.3 Asymptotic Gaussian distribution and hypothesis test 

The quadratic dependence measure studied in this paper provides us with an
estimator for evaluating the dependence between variables. In the following,
we construct a hypothesis test of independence based on the quadratic
dependence measure. The asymptotic laws under the hypotheses of independence
(denoted $H_0$) and dependence (denoted $H_1$) are deduced. Finally, it is
shown that this hypothesis test of independence is consistent for any choice
of the bandwidth.

• Law under the hypothesis of independence (denoted $H_0$):

The estimator $N\hat{Q}$ asymptotically follows a $\gamma\chi^2(\beta)$ law,
where $\gamma = V_1/(2E_1)$ and $\beta = 2E_1^2/V_1$, with
$E_1 = \lim_{N \to \infty} N E[\hat{Q}]$ under $H_0$ and
$V_1 = \lim_{N \to \infty} N \operatorname{var}(\hat{Q})$ under $H_0$. It
holds that

K K K . K 

Ei = H ^fcto^-n^M-E* / K lh(x)dx-E{Tr Yk (Y k )}) JJ E[ir Yl (Y)] 

fc=l J fc=l fe=l J l=l,l^k 

and 

K K K 

Vx = 2j] E[n Yk (r fc )] 2 - 4 [] E[ir Yk {Y k f ] + 4 J] E[n Yk (Y k )] 

k=l fe=l k=l 

K K 

+ 2j2(E[n Yk (Y k )}-E[n Yk (Y k )} 2 ) [] ^K(il)] 2 
fe=i i=\.i^k 

K K 

- 4j2(Eln Yk (Yk)]-ElK Yk (Y k n J] E[n Y (Yf} 

k=l I =ld=£k 

K K 

+ 2 E E (E[n Yk (Y k )] 2 E[n Ym (Y m )] 2 -2E[n Yk (Y k ) 2 }E[n Ym (Y m )} 2 

A*— 1 m=X,m^k 

K 

+ E[n Yk (Y k ) 2 ]E[n Ym (Y m ) 2 }) J] E[n Yl (Y)] 

l=l,l=£k,m 

This result is due to Kankainen [11].

• Law under the hypothesis of dependence (denoted $H_1$):
$\sqrt{N}(\hat{Q} - Q)$ asymptotically follows a normal law with zero mean and
variance $\Sigma$, where

$$\Sigma = 4\xi^{(11)} + 4K^2 \xi^{(22)} + 4(K+1)^2 \xi^{(33)} - 8(K+1) \xi^{(13)} - 8K(K+1) \xi^{(23)} + 8K \xi^{(12)},$$

with $\xi$ the variance-covariance matrix of the corresponding U-statistics,
which depends on $\mathcal{K}_2$ and h.
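
The scaled chi-square law under $H_0$ is a two-moment match: $\gamma\chi^2(\beta)$ has mean $\gamma\beta = E_1$ and variance $2\gamma^2\beta = V_1$. The sketch below turns given values of $E_1$ and $V_1$ into a rejection threshold $q_\alpha$. It treats $E_1$ and $V_1$ as inputs (their closed forms are those stated in the text) and estimates the quantile by Monte Carlo, using the fact that $\gamma\chi^2(\beta)$ is a Gamma law with shape $\beta/2$ and scale $2\gamma$; the function names and the number of draws are illustrative choices.

```python
import numpy as np

def scaled_chi2_parameters(e1, v1):
    """Return (gamma, beta) such that gamma * chi2(beta) has mean e1 and variance v1."""
    gamma = v1 / (2.0 * e1)
    beta = 2.0 * e1 ** 2 / v1
    return gamma, beta

def critical_value(e1, v1, n_obs, alpha=0.05, n_draws=400_000, seed=0):
    """Monte Carlo estimate of q_alpha with P(gamma * chi2(beta) > n_obs * q_alpha) = alpha."""
    gamma, beta = scaled_chi2_parameters(e1, v1)
    rng = np.random.default_rng(seed)
    # gamma * chi2(beta) is distributed as Gamma(shape=beta / 2, scale=2 * gamma).
    draws = rng.gamma(shape=beta / 2.0, scale=2.0 * gamma, size=n_draws)
    return float(np.quantile(draws, 1.0 - alpha)) / n_obs

gamma, beta = scaled_chi2_parameters(0.3, 0.05)
print(gamma, beta, critical_value(0.3, 0.05, n_obs=500))
```

The test then rejects independence when the observed $\hat{Q}$ exceeds the returned threshold. An exact chi-square quantile function could replace the Monte Carlo step where one is available.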

Lemma 3.2 The independence test defined above is consistent for any choice
of the bandwidth: given $\alpha$, the level of significance, we define
$q_\alpha$ as the smallest number satisfying the inequality
$P_{H_0}(\hat{Q} > q_\alpha) = 1 - F_{\gamma\chi^2(\beta)}(N q_\alpha) \le \alpha$.
Then, the power of the test $1 - P_{H_1}(\hat{Q} \le q_\alpha)$ tends to 1 as
N goes to infinity. In addition, the power of the test admits a lower bound:

$$1 - P_{H_1}(\hat{Q} \le q_\alpha) = P_{H_1}(\hat{Q} > q_\alpha) \ge 1 - \frac{\operatorname{var}(\hat{Q})}{(q_\alpha - Q)^2}. \qquad (7)$$

The proof is given in Appendix H. Note that the lower bound in (7) is not
sharp, as is illustrated in Figure 2.

Figure 2: Variance (decreasing) and type II error (increasing) for different
sample sizes: N = 100, N = 200, N = 400, N = 800 and N = 1600.



3.4 Convergence rate and choice of the bandwidth 

As the asymptotic bias of the estimator tends to zero when the sample size N
goes to infinity without any constraint on the bandwidth, it is not necessary
to choose the bandwidth so as to make a tradeoff between the bias and the
variance. As a result, the bandwidth can instead be adjusted in a tradeoff
between the minimisation of the variance and the minimisation of the type II
error of the test.

With the expression of the variance given above, it is clear that when the
bandwidth h of the kernel increases, the variance of the estimator decreases.
But, as the bandwidth h increases, the type II error of the test is expected
to increase. Indeed, the asymptotic power of the test is given by

$$1 - P_{H_1}(\hat{Q} \le q_\alpha) \approx 1 - \Phi\Big(\frac{\sqrt{N}(q_\alpha - Q)}{\sqrt{\Sigma}}\Big),$$

where $\Phi$ is the standard normal cumulative distribution function and, for
a given $\alpha$, $q_\alpha$ satisfies
$P_{H_0}(\hat{Q} > q_\alpha) = 1 - F_{\gamma\chi^2(\beta)}(N q_\alpha)$.
Figure 2 illustrates the behaviour of optimal choices of the bandwidth
depending on the sample size. In this figure, we observe that for a given
bandwidth the convergence of the variance of the estimator is slow (of order
1/N). But, if the bandwidth is adjusted to make a tradeoff between the
variance and the power of the test, the convergence rate is greatly increased.
Future work is needed to quantify this increase and to propose a computational
rule to optimize the bandwidth.

4 Conclusion

The quadratic dependence measure is revisited and its asymptotic properties
are demonstrated. The convergence rate in terms of the variance is of order
1/N, and the power of the test defined by this measure converges to one at a
rate of at least 1/N, N being the sample size.

The introduction of a kernel framework in the definition of the quadratic
dependence measure enables us to propose an efficient estimator with a
computational cost of order $KN^2$, with K the dimension of the problem. This
kernel is adjusted with a bandwidth whose choice does not affect the bias,
unlike in density estimation. Because of this property, the bandwidth can be
chosen as a function of the sample size N so as to increase the convergence
rate in terms of the variance and of the power of the test, rather than to
debias the estimator.

Appendix

A Proof of Lemma 2.2

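After a change of variable in the integral, Q is written as

$$Q(Y_1, \dots, Y_K) = \frac{1}{2 \prod_{k=1}^K \sigma_{Y_k}} \int \Big\{ E\Big[\prod_{k=1}^K \mathcal{K}_h\Big(\frac{y_k - Y_k}{\sigma_{Y_k}}\Big)\Big] - \prod_{k=1}^K E\Big[\mathcal{K}_h\Big(\frac{y_k - Y_k}{\sigma_{Y_k}}\Big)\Big] \Big\}^2 dy_1 \cdots dy_K.$$

Let us introduce a new kernel,
$\mathcal{K}_{h,\sigma_{Y_k}}(x) = \mathcal{K}_h(x/\sigma_{Y_k})$; then Q is
expressed as

$$Q(Y_1, \dots, Y_K) = \frac{1}{2 \prod_{k=1}^K \sigma_{Y_k}} \int \Big\{ \Big(\prod_{k=1}^K \mathcal{K}_{h,\sigma_{Y_k}}\Big) * dF_Y(y_1, \dots, y_K) - \prod_{k=1}^K \mathcal{K}_{h,\sigma_{Y_k}} * dF_{Y_k}(y_k) \Big\}^2 dy,$$

where $dy := dy_1 \dots dy_K$. Moreover, the Fourier transforms of
$(y_1, \dots, y_K) \mapsto (\prod_{k=1}^K \mathcal{K}_{h,\sigma_{Y_k}}) * dF_Y(y_1, \dots, y_K)$
and $y_k \mapsto \mathcal{K}_{h,\sigma_{Y_k}} * dF_{Y_k}(y_k)$ are
respectively equal to

$$(y_1, \dots, y_K) \mapsto \prod_{k=1}^K \sigma_{Y_k} \psi_{\mathcal{K}_h}(\sigma_{Y_k} y_k) \, \psi_Y(y_1, \dots, y_K) \quad \text{and} \quad y_k \mapsto \sigma_{Y_k} \psi_{\mathcal{K}_h}(\sigma_{Y_k} y_k) \, \psi_{Y_k}(y_k).$$

Then, using Parseval's formula, that is, the fact that the Fourier transform
is unitary, the lemma is proved.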



B Proof of Lemma 2.3

The first part of Lemma 2.3 comes directly from Bochner's lemma applied to the
convolution of the kernel and the density function. The limit behaviour of Q
when h tends to zero is proved by applying a Taylor expansion to the density.
The proof is rather computational, and we only give it for

$$\lim_{h \to 0} \int \Big[ \Big( \int \mathcal{K}(v) \, p_{Y_k}(hv + t) \, dv \Big)^2 - \big( p_{Y_k}(t) \big)^2 \Big] dt = 0.$$

Indeed, by applying the triangle inequality and the Cauchy-Schwarz
inequality, the desired quantity is bounded by
$h M_p \int |v| \mathcal{K}^2(v) \, dv$, which proves the limit. The other
terms behave similarly.

C Proof of Lemma 2.4

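This lemma is proved by expanding the square term in the definition of Q (see
Definition 2.1):

$$D_Y(y_1, \dots, y_K)^2 = \Big( E\Big[\prod_{k=1}^K \mathcal{K}_h\Big(y_k - \frac{Y_k}{\sigma_{Y_k}}\Big)\Big] - \prod_{k=1}^K E\Big[\mathcal{K}_h\Big(y_k - \frac{Y_k}{\sigma_{Y_k}}\Big)\Big] \Big)^2$$

$$= \Big( \int \prod_{k=1}^K \mathcal{K}_h\Big(y_k - \frac{u_k}{\sigma_{Y_k}}\Big) dF_Y(u_1, \dots, u_K) - \prod_{k=1}^K \int \mathcal{K}_h\Big(y_k - \frac{u_k}{\sigma_{Y_k}}\Big) dF_{Y_k}(u_k) \Big)^2$$

$$= \int\!\!\int \prod_{k=1}^K \mathcal{K}_h\Big(y_k - \frac{u_k}{\sigma_{Y_k}}\Big) \mathcal{K}_h\Big(y_k - \frac{v_k}{\sigma_{Y_k}}\Big) dF_Y(u) \, dF_Y(v)$$

$$+ \prod_{k=1}^K \int\!\!\int \mathcal{K}_h\Big(y_k - \frac{u_k}{\sigma_{Y_k}}\Big) \mathcal{K}_h\Big(y_k - \frac{v_k}{\sigma_{Y_k}}\Big) dF_{Y_k}(u_k) \, dF_{Y_k}(v_k)$$

$$- 2 \int \prod_{k=1}^K \Big[ \int \mathcal{K}_h\Big(y_k - \frac{u_k}{\sigma_{Y_k}}\Big) \mathcal{K}_h\Big(y_k - \frac{v_k}{\sigma_{Y_k}}\Big) dF_{Y_k}(v_k) \Big] dF_Y(u).$$

In order to apply Fubini's theorem, the convergence of the integrals has to be
checked:

$$\int\!\!\int\!\!\int \Big| \prod_{k=1}^K \mathcal{K}_h\Big(y_k - \frac{u_k}{\sigma_{Y_k}}\Big) \mathcal{K}_h\Big(y_k - \frac{v_k}{\sigma_{Y_k}}\Big) \Big| dF_Y(u) \, dF_Y(v) \, dy \le \int\!\!\int\!\!\int \prod_{k=1}^K \Big| \mathcal{K}_h\Big(y_k - \frac{u_k}{\sigma_{Y_k}}\Big) \mathcal{K}_h\Big(y_k - \frac{v_k}{\sigma_{Y_k}}\Big) \Big| dy \, dF_Y(u) \, dF_Y(v)$$

$$\le \int\!\!\int dF_Y(u) \, dF_Y(v) \prod_{k=1}^K \int |\mathcal{K}_h(y_k)|^2 dy_k < \infty;$$

then, for all $k = 1, \dots, K$,

$$\int\!\!\int\!\!\int \Big| \mathcal{K}_h\Big(y_k - \frac{u_k}{\sigma_{Y_k}}\Big) \mathcal{K}_h\Big(y_k - \frac{v_k}{\sigma_{Y_k}}\Big) \Big| dF_{Y_k}(u_k) \, dF_{Y_k}(v_k) \, dy_k < \infty$$

and

$$\int\!\!\int\!\!\int \Big| \prod_{k=1}^K \mathcal{K}_h\Big(y_k - \frac{u_k}{\sigma_{Y_k}}\Big) \mathcal{K}_h\Big(y_k - \frac{v_k}{\sigma_{Y_k}}\Big) \Big| \prod_{k=1}^K dF_{Y_k}(v_k) \, dF_Y(u) \, dy < \infty.$$

Applying Fubini's theorem and integrating each product of two kernels over
$y_k$, which yields $\mathcal{K}_{2,h}((u_k - v_k)/\sigma_{Y_k})$, concludes
the proof of the lemma.

D Proof of Lemma 2.5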



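This lemma is proved in the same way as Lemma 2.2, by using Parseval's
formula and the following equalities:

$$\frac{1}{N^2} \sum_{m=1}^N \sum_{n=1}^N \prod_{k=1}^K \mathcal{K}_{2,h}\Big(\frac{Y_k(n) - Y_k(m)}{\sigma_{Y_k}}\Big) = \prod_{k=1}^K \sigma_{Y_k} \int |\hat{\psi}_Y(y)|^2 \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(\sigma_{Y_k} y_k) \frac{dy_k}{2\pi},$$

$$\prod_{k=1}^K \frac{1}{N^2} \sum_{m=1}^N \sum_{n=1}^N \mathcal{K}_{2,h}\Big(\frac{Y_k(n) - Y_k(m)}{\sigma_{Y_k}}\Big) = \prod_{k=1}^K \sigma_{Y_k} \int |\hat{\psi}_{Y_k}(y_k)|^2 \psi_{\mathcal{K}_{2,h}}(\sigma_{Y_k} y_k) \frac{dy_k}{2\pi},$$

$$\frac{1}{N} \sum_{m=1}^N \prod_{k=1}^K \frac{1}{N} \sum_{n=1}^N \mathcal{K}_{2,h}\Big(\frac{Y_k(n) - Y_k(m)}{\sigma_{Y_k}}\Big) = \prod_{k=1}^K \sigma_{Y_k} \int \operatorname{Re}\Big( \hat{\psi}_Y(y) \prod_{k=1}^K \overline{\hat{\psi}_{Y_k}(y_k)} \Big) \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(\sigma_{Y_k} y_k) \frac{dy_k}{2\pi}.$$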



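E Proof of Lemma 3.1

As the variables $Y_1, \dots, Y_K$ are not independent, there exist
$y_1, \dots, y_K$ such that the following inequality holds:

$$\psi_Y(y_1, \dots, y_K) \ne \prod_{k=1}^K \psi_{Y_k}(y_k).$$

Then, by continuity of the characteristic functions, there exists a bounded
open set U having positive Lebesgue measure such that

$$\inf_{y \in U} \Big| \psi_Y(y_1, \dots, y_K) - \prod_{k=1}^K \psi_{Y_k}(y_k) \Big| > 0.$$

As the Fourier transform of $\mathcal{K}_2$ is different from zero almost
everywhere, we obtain the following inequality:

$$\int_U \Big| \psi_Y(y_1, \dots, y_K) - \prod_{k=1}^K \psi_{Y_k}(y_k) \Big|^2 \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy > 0.$$

From Csörgő [6],

$$\sup_{y \in B} \big| \hat{\psi}_Y(y) - \psi_Y(y) \big| \to 0 \ \text{a.s.}, \quad N \to +\infty,$$

and

$$\sup_{y \in B} \Big| \prod_{k=1}^K \hat{\psi}_{Y_k}(y_k) - \prod_{k=1}^K \psi_{Y_k}(y_k) \Big| \to 0 \ \text{a.s.}, \quad N \to +\infty,$$

for any bounded set B, and in particular for B = U.

Thus, as the Fourier transforms of the kernel are summable,

$$\lim_{N \to +\infty} \int_U \big| \hat{\psi}_Y(y) - \psi_Y(y) \big| \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy = 0 \ \text{a.s.},$$

$$\lim_{N \to +\infty} \int_U \Big| \prod_{k=1}^K \hat{\psi}_{Y_k}(y_k) - \prod_{k=1}^K \psi_{Y_k}(y_k) \Big| \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy = 0 \ \text{a.s.}$$

Let us now remark that, by the triangle inequality,

$$\int_U \Big| \hat{\psi}_Y(y) - \prod_{k=1}^K \hat{\psi}_{Y_k}(y_k) \Big| \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy \ge \int_U \Big| \psi_Y(y) - \prod_{k=1}^K \psi_{Y_k}(y_k) \Big| \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy$$

$$- \int_U \big| \hat{\psi}_Y(y) - \psi_Y(y) \big| \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy - \int_U \Big| \prod_{k=1}^K \hat{\psi}_{Y_k}(y_k) - \prod_{k=1}^K \psi_{Y_k}(y_k) \Big| \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy,$$

which leads us to

$$\liminf_{N \to +\infty} \int_U \Big| \hat{\psi}_Y(y_1, \dots, y_K) - \prod_{k=1}^K \hat{\psi}_{Y_k}(y_k) \Big| \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy > 0.$$

With the Cauchy-Schwarz inequality, we get

$$\liminf_{N \to +\infty} \int_U \Big| \hat{\psi}_Y(y_1, \dots, y_K) - \prod_{k=1}^K \hat{\psi}_{Y_k}(y_k) \Big|^2 \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy > 0.$$

Indeed,

$$\int_U \Big| \hat{\psi}_Y(y) - \prod_{k=1}^K \hat{\psi}_{Y_k}(y_k) \Big|^2 \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy \ge \frac{\Big( \int_U \big| \hat{\psi}_Y(y) - \prod_{k=1}^K \hat{\psi}_{Y_k}(y_k) \big| \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy \Big)^2}{\int_U \prod_{k=1}^K \psi_{\mathcal{K}_{2,h}}(y_k) \, dy}.$$

F U-statistics decomposition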



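Using the concept of U-statistics, $\hat{Q}$ is expressed in a different way:
$\hat{Q} = U_1' + U_2' - 2U_3'$, where

$$U_1' = \frac{C_N^2 \, 2!}{N^2} U_1 + \frac{b_N^{(1)}}{N}, \qquad U_2' = \frac{C_N^{2K} (2K)!}{N^{2K}} U_2 + \frac{b_N^{(2)}}{N}, \qquad U_3' = \frac{C_N^{K+1} (K+1)!}{N^{K+1}} U_3 + \frac{b_N^{(3)}}{N}.$$

$b_N^{(1)}$, $b_N^{(2)}$, $b_N^{(3)}$ are random variables such that the limit
of $E[(b_N^{(i)}/N)^2]$ is equal to zero as N tends to infinity, and $U_1$,
$U_2$ and $U_3$ are the U-statistics associated with $U_1'$, $U_2'$ and $U_3'$
(the exact formulas are given at the end of this part).

In the following lemmas, we deduce the exact formulas of $U_1$, $U_2$, $U_3$
and $b_N^{(1)}$, $b_N^{(2)}$ and $b_N^{(3)}$: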
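Lemma F.1 Let us decompose the estimation of $E[\pi_Y(Y)]$ in terms of
U-statistics:

$$U_1'(Y(1), \dots, Y(N)) := \hat{E}[\hat{\pi}_Y(Y)] = \frac{1}{N} \sum_{n=1}^N \frac{1}{N} \sum_{m=1}^N \prod_{k=1}^K \mathcal{K}_{2,h}\Big(\frac{Y_k(n) - Y_k(m)}{\sigma_{Y_k}}\Big) = \frac{N-1}{N} U_1 + \frac{b_N^{(1)}}{N},$$

where

$$U_1 = \frac{2}{N(N-1)} \sum_{1 \le i < j \le N} \prod_{k=1}^K \mathcal{K}_{2,h}\Big(\frac{Y_k(i) - Y_k(j)}{\sigma_{Y_k}}\Big)$$

and

$$b_N^{(1)} = \prod_{k=1}^K \mathcal{K}_{2,h}(0).$$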
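
The relation between the plain double sum $U_1'$ and the U-statistic $U_1$ can be checked numerically. The sketch below assumes the Gaussian choice $\mathcal{K}_2(x) = e^{-x^2}$, for which $\mathcal{K}_{2,h}(0) = 1/h$, and checks the identity $U_1' = \frac{N-1}{N} U_1 + \frac{1}{N} \prod_k \mathcal{K}_{2,h}(0)$, which follows from separating the diagonal terms $n = m$ of the double sum. The helper names are illustrative.

```python
from itertools import combinations

import numpy as np

def gaussian_k2(x, h):
    """K_{2,h}(x) = K_2(x / h) / h with K_2(x) = exp(-x**2)."""
    return np.exp(-(x / h) ** 2) / h

def pair_kernel(Y, sigma, h, i, j):
    """Product over the K components of K_{2,h}((Y_k(i) - Y_k(j)) / sigma_k)."""
    return float(np.prod(gaussian_k2((Y[i] - Y[j]) / sigma, h)))

def v_statistic(Y, sigma, h):
    """U_1': average over all ordered pairs (n, m), diagonal included."""
    N = len(Y)
    return sum(pair_kernel(Y, sigma, h, n, m) for n in range(N) for m in range(N)) / N ** 2

def u_statistic(Y, sigma, h):
    """U_1: average over the unordered pairs i < j only."""
    pairs = list(combinations(range(len(Y)), 2))
    return sum(pair_kernel(Y, sigma, h, i, j) for i, j in pairs) / len(pairs)

rng = np.random.default_rng(1)
N, K, h = 12, 3, 0.7
Y = rng.standard_normal((N, K))
sigma = np.ones(K)  # scale factors assumed known and equal to one here

lhs = v_statistic(Y, sigma, h)
rhs = (N - 1) / N * u_statistic(Y, sigma, h) + (1.0 / h) ** K / N
print(lhs, rhs)
```

The remainder term $(1/h)^K / N$ is deterministic here and vanishes as N grows for a fixed bandwidth.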



Lemma F.2 Let us define the set
$S_2 = \{(i_1, \dots, i_K, j_1, \dots, j_K) \mid \forall k, 1 \le k \le K, 1 \le i_k \le N, 1 \le j_k \le N\}$.
The second term in the expression of $\hat{Q}$ corresponds to the estimation
of $\prod_{k=1}^K E[\pi_{Y_k}(Y_k)]$ and is written in terms of U-statistics
as:

$$U_2'(Y(1), \dots, Y(N)) := \prod_{k=1}^K \hat{E}[\hat{\pi}_{Y_k}(Y_k)] = \prod_{k=1}^K \frac{1}{N} \sum_{n=1}^N \frac{1}{N} \sum_{m=1}^N \mathcal{K}_{2,h}\Big(\frac{Y_k(n) - Y_k(m)}{\sigma_{Y_k}}\Big) = \frac{C_N^{2K} (2K)!}{N^{2K}} U_2 + \frac{b_N^{(2)}}{N},$$

where

$$U_2(Y(1), \dots, Y(N)) = \frac{1}{N(N-1) \cdots (N-2K+1)} \sum_{S_2^*} \prod_{k=1}^K \mathcal{K}_{2,h}\Big(\frac{Y_k(i_k) - Y_k(j_k)}{\sigma_{Y_k}}\Big),$$

where $S_2^*$ is the subset of $S_2$ of 2K-dimensional elements whose
components are all different from one another, and

$$b_N^{(2)} = \frac{1}{N^{2K-1}} \sum_{\bar{S}_2} \prod_{k=1}^K \mathcal{K}_{2,h}\Big(\frac{Y_k(i_k) - Y_k(j_k)}{\sigma_{Y_k}}\Big),$$

and $\bar{S}_2 = S_2 \setminus S_2^*$, that is, at least two components of
each element are equal.
Lemma F.3 Let us define the set
$S_3 = \{(i_1, \dots, i_K, j) \mid \forall k, 1 \le k \le K, 1 \le i_k \le N, 1 \le j \le N\}$.
Finally, the estimation of $E[\prod_{k=1}^K \pi_{Y_k}(Y_k)]$ is expressed as:

$$U_3'(Y(1), \dots, Y(N)) := \hat{E}\Big[\prod_{k=1}^K \hat{\pi}_{Y_k}(Y_k)\Big] = \frac{1}{N} \sum_{n=1}^N \prod_{k=1}^K \frac{1}{N} \sum_{m=1}^N \mathcal{K}_{2,h}\Big(\frac{Y_k(n) - Y_k(m)}{\sigma_{Y_k}}\Big) = \frac{C_N^{K+1} (K+1)!}{N^{K+1}} U_3 + \frac{b_N^{(3)}}{N},$$

where

$$U_3(Y(1), \dots, Y(N)) = \frac{1}{N(N-1) \cdots (N-K)} \sum_{S_3^*} \prod_{k=1}^K \mathcal{K}_{2,h}\Big(\frac{Y_k(i_k) - Y_k(j)}{\sigma_{Y_k}}\Big),$$

$S_3^*$ is the subset of $S_3$ of (K+1)-dimensional elements whose components
are all different from one another, and

$$b_N^{(3)} = \frac{1}{N^K} \sum_{\bar{S}_3} \prod_{k=1}^K \mathcal{K}_{2,h}\Big(\frac{Y_k(i_k) - Y_k(j)}{\sigma_{Y_k}}\Big),$$

and $\bar{S}_3 = S_3 \setminus S_3^*$, that is, at least two components of
each element are equal.

G Proof of the computation of variance



Using the decomposition of $\hat{Q}$ in terms of U-statistics, and following
the computation of the variance given in [9], the variance of $\hat{Q}$ is
computed.

H Proof of Lemma 3.2

Using the Chebyshev inequality, we deduce:

$$1 - P_{H_1}(\hat{Q} \le q_\alpha) = P_{H_1}(\hat{Q} > q_\alpha) \ge 1 - \frac{\operatorname{var}(\hat{Q})}{(q_\alpha - Q)^2}.$$

Thanks to the asymptotic properties of the variance, the right-hand side of
the above expression tends to 1 when the sample size goes to infinity.
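
The intermediate steps of this bound can be spelled out as follows. The sketch relies on two assumptions that are implicit in the statement: the estimator is unbiased under $H_1$ (Section 3.2), and N is large enough that $q_\alpha < Q$, which is possible because $Q > 0$ under dependence.

```latex
% Assumptions: E[\hat{Q}] = Q under H_1 and q_\alpha < Q.
P_{H_1}(\hat{Q} \le q_\alpha)
  = P_{H_1}\big(Q - \hat{Q} \ge Q - q_\alpha\big)
  \le P_{H_1}\big(|\hat{Q} - Q| \ge Q - q_\alpha\big)
  \le \frac{\operatorname{var}(\hat{Q})}{(Q - q_\alpha)^2},
\qquad \text{hence} \qquad
P_{H_1}(\hat{Q} > q_\alpha) \ge 1 - \frac{\operatorname{var}(\hat{Q})}{(q_\alpha - Q)^2}.
```

Since the variance of $\hat{Q}$ is of order 1/N for a fixed bandwidth, the bound gives a convergence of the power to one at a rate of at least 1/N.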



References 

[1] S. Achard, D.T. Pham, and C. Jutten. Quadratic dependence measure for
non linear blind sources separation. In Proc. Int. Workshop ICA2003, pages
263-268, Apr. 2003.

[2] J.R. Blum, J. Kiefer, and M. Rosenblatt. Distribution free tests of
independence based on the sample distribution functions. Ann. Math. Stat.,
32:485-498, 1961.

[3] J.F. Cardoso. Blind signal separation: Statistical principles. Proceedings
of the IEEE, 86(10):2009-2025, Oct. 1998.

[4] A. Chen and P.J. Bickel. Consistent independent component analysis and
prewhitening. IEEE Trans. on Signal Processing, 53(10):3625-3632, 2005.

[5] P. Comon. Independent component analysis, a new concept? Signal
Processing, 36(3):287-314, Apr. 1994.

[6] S. Csörgő. Limit behaviour of the empirical characteristic function. The
Annals of Probability, 9(1):130-144, 1981.

[7] J. Eriksson and V. Koivunen. Characteristic function based independent
component analysis. Signal Processing, 83:2195-2208, 2003.

[8] A. Feuerverger. A consistent test for bivariate dependence. International
Statistical Review, 61(3):419-433, 1993.

[9] W. Hoeffding. A class of statistics with asymptotically normal
distribution. Ann. Math. Stat., 19:293-325, 1948.

[10] W. Hoeffding. A non-parametric test of independence. Ann. Math. Stat.,
19:546-557, 1948.

[11] A. Kankainen. Consistent testing of total independence based on empirical
characteristic functions. PhD thesis, University of Jyväskylä, 1995.

[12] M. Rosenblatt. A quadratic measure of deviation of two-dimensional
density estimates and a test of independence. The Annals of Statistics,
3(1):1-14, 1975.

[13] A. Rényi. Calcul des probabilités avec un appendice sur la théorie de
l'information. Éditions Jacques Gabay, 1966.

[14] V. Vapnik. The Nature of Statistical Learning Theory. Springer, 1998.



